Method and system for the graphical modeling of data and calculations of a spreadsheet

ABSTRACT

A method and system for the graphical modeling of calculations in a spreadsheet which generates a graphical representation of formulas and relevant data contained in a spreadsheet. This graphical representation consists of tree or graph structures. Any changes made to the trees or spreadsheet cells can subsequently be translated to the corresponding trees and/or cells. A partial resolution of the calculations represented by a tree may be analyzed at specific breakpoints positioned at any node within the tree. Entirely new trees can be created by a user to represent new formulas. A user can choose to have the system translate these new trees into equivalent expressions based on the spreadsheet&#39;s native formula syntax and populate the appropriate corresponding cells. The present invention provides a means of visualizing, editing and constructing complex relationships between formulas, functions and data to improve overall spreadsheet use.

CROSS-REFERENCE TO RELATED APPLICATIONS

Provisional Patent Application: 60/654,641 Filed: Feb. 18, 2005

Provisional Patent Application: 60/705,888 Filed: Aug. 5, 2005

FEDERALLY SPONSORED RESEARCH

None

SEQUENCE LISTING

None.

BACKGROUND

The present invention relates to computer based calculations and more specifically those performed by programs commonly known as “spreadsheets”. Spreadsheets provide a means of organizing related sets of dynamic calculations called formulas into a meaningful collection of data and have common application in such fields as engineering, finance and science. Spreadsheets are comprised of a series of horizontal and vertically aligned adjacent regions referred to as cells. Within these cells a user can enter formulas in what is called infix notation. Infix notation consists of two operands connected together by one binary operator as shown below. $\begin{matrix} 2 & + & 4 \\ \hat{} & \hat{} & \hat{} \\ {operand} & {operator} & {operand} \end{matrix}$ This is the notation in which we are taught to think of algebraic expressions in grade school. Once the calculation or formula is entered into a cell the spreadsheet program can be set to resolve the equation and display the result within the cell. Furthermore, these cells can themselves be part of other formulas contained in separate cells, in which case the result of one cell is calculated as an operand of a second cell.

For the remainder of the text it will be implied that cells are arranged in columns notated with letters and rows notated with numbers starting with A and 1 respectively. As such, the expression below adds the contents of the cell found in the first column and first row to the constant integer 2. A1+2 If the cell (A1) referenced in the expression above contains a formula then the resolution of that formula is replaced with its reference in the expression and the first formula is resolved. By applying this system the spreadsheet application can be used as an efficient means of structuring inter-related formulas to perform various complex calculations. Often these calculations take on a somewhat syntactically cryptic form providing little insight into what is actually being computed, such as the example below also shown in FIG. 2. (2+((A5+2)*4)−(C2*(2+4))) This expression is not an overly complicated algebraic form. However, its clarity is hindered by the fact that the user has no direct means of knowing whether the referenced cells contain constant values, and if so what they might be, or if they contain further calculations taking place elsewhere from within a separate formula of the referenced cell. The problem is compounded by the fact the formulas are often not clearly viewed all at once, since they are spread throughout the cells in various physical locations and often found in entirely separate screen views or files commonly called sheets or workbooks respectively. The present invention proposes a means of rectifying this problem by providing a graphical and symbolic representation of the calculations contained in the cells.

DEFINITIONS

The following is a list of words used throughout this application and their associated meaning as it pertains to the present invention.

-   Area: Any single cell or continuous block of adjacent cells. -   Filled Area: An area which contains value data (non-empty). -   Empty Area: An area with no value contents. -   Formatted Area: A filled or empty area in which all member cells     contain similar formatting such as patterns, shading, cell color,     borders, and/or specific fonts. -   Function: A set of computational steps used to provide an output     based on a given input invoked through a mnemonic identifier and     list of input values. -   Range: A single cell, group or groups of continuous or     non-continuous cells in a spreadsheet. -   Named Range: A range which is assigned a mnemonic identifier from     which it can be referenced. -   Formatted Range: A range in which all member cells contain similar     formatting such as data type, patterns, shading, cell color,     borders, and/or specific fonts. -   Input Range: Any cell, area or range whose contents can be used as     input for formulas and or functions. -   Relative Address: The address of a cell based on the relative     position of the cell and the cell of a referencing formula. If the     formula is copied, the relative reference changes to retain a     consistent relative position. -   Absolute Address: Is an exact address of a cell regardless of the     position relative to the cell of the formula. -   Dynamic Content: Any cell, area or range that contains data whose     value may change dependent upon operations, functions or other data     and spreadsheet contents e.g. formulas and references. -   Tree: A hierarchical structure in which parent nodes are connected     to one or more child nodes to establish precedence within a set.     Often used in mathematics and computer science. -   Graph: Somewhat like a tree; however, nodes are connected within a     more chaotic network in which recursive relationships can be     represented. -   Node: Element of a tree or graph structure. The graphical     representation of a node may be depicted with specific physical     properties such as shape, size and/or color which denote the data     type of the value they represent. -   Discovery: The algorithmic process of comparing data against a set     of pre-determined parameters to establish meaningful insight into     the properties of such data. -   Automatic Expansion: An algorithmic process of adding new cells,     area or ranges into set or group based on similarities between the     newly added cells and the existing group.     Such similarities can be comprised of similar data types, similar     formulas, similar formatting, location, or any combination thereof.

SUMMARY

Existing spreadsheet applications provide a method for working with formulas based on a modified infix notation, presented through a textual representation. The present invention provides a means to gather and convert information pertaining to such textual representations into graphical representations. The present invention can keep both the graphical representation and textual representations functionally equivalent by translating changes made to one into the appropriate changes to the data defining the other. In doing so, the present invention effectively provides a graphical interface for working with formulas within a spreadsheet.

DESCRIPTION OF DRAWINGS

FIG. 1: Is an example of a graphical representation corresponding to the formula shown in textual notation in FIG. 2.

FIG. 2: Is an example of a formula expressed in the textual notation commonly used within a spreadsheet.

FIG. 3: Shows an example of a tree generated by the present invention from the simple spreadsheet formula IF(1,2,3).

FIG. 4: Shows an example of a compound tree generated by the presented invention, by combining the tree generated from a referenced cell formula with the tree generated from the dependent formula.

FIG. 5: Shows an example of a tree generated by the present invention from a formula containing a function. In this case the function is MEDIAN(1,2,3) which finds the median value of the given input values of (1,2, and 3).

FIG. 6: Shows a tree as generated by the present invention which represents a compound type tree with a variety of input nodes. The reference node labeled as Field 12 corresponds to an area reference which represents a group of continuous adjacent cells located between cell D10 and cell E14. As such, the child nodes consist of any contents within the area. In this case, they consist of a reference, a value, and a function.

FIG. 7: Shows a collection of trees generated by the present invention from several separate formulas which all share a similar computational structure.

FIG. 8: Shows a template tree generated by the present invention to provide a common functional interface for the trees and underlying formulas of FIG. 7.

FIG. 9: Shows an example of the partial resolution display provided by the present invention. The partially resolved value is displayed as @:=72 which is the value obtained by solving all the operations of underlying formula in proper precedence up to and including the operator of the node selected with the cursor arrow.

FIG. 10: Shows the model map display provided by the present invention.

FIG. 11: Illustrates an example of building a formula with static range nodes.

FIG. 12: Illustrates an example of building a formula with dynamic content trees.

DESCRIPTION

The preferred embodiment of the present invention consists of computer software. As such, the software can be highlighted based on an overview of core computational components. The following provides an outline of these core components and their sub categories; all of which shall be described in further detail below, from both an implementation and usability standpoint.

I. Operational Functions:

1) Analyze/Discovery

-   -   a. Dynamic Content     -   b. Input Ranges     -   c. Names

2) Build/Modify/Design

3) Synchronize

II. Algorithmic Processes:

1) Formula Compare/Compression

2) Name Discovery

III. Views:

1) Dynamic Content View

2) Input Range View

3) Combined Dynamic Content and Input Range View

4) Additional Information Views

I. Operational Functions:

1) Analyze/Discovery

The analyze/discover component of the present invention relates to the ability to gather meaningful information from a spreadsheet. This is accomplished via a link to the spreadsheet application or an associated spreadsheet file. Many such links are common through information sharing routines of computer software and are well known to those familiar with the art. It should be noted that the present invention could also be embodied as a component of the spreadsheet application itself. In the second embodiment the link can be considered an internal routine of the spreadsheet application. The process of selecting relevant sections of a spreadsheet from which to read may be accomplished through the selection from a list of available sources by user input. As such, a user may select a source consisting of an area, range or entire sheet within a spreadsheet or several separate spreadsheet files to create a link. Once the link has been established to the requested source, information is gathered from the spreadsheet and analyzed as follows.

a. Dynamic Content

All dynamic content such as formulas and references are retrieved from the source and displayed as graphical structures referred to as trees e.g. as shown in FIG. 1. The tree in FIG. 1 is the graphical representation of the formula shown in common infix notation in FIG. 2. In FIG. 1 the rectangular node at the top is called a root node and displays information concerning the output cell of the formula. The other nodes within a tree represent computational operators and operands. These nodes can use physical properties such as shape, size and/or color to relay information about the intent or functionality of their contents. When presented in a tree or graph structure it is possible to relay precedence within a formula without the use of parenthesis. This format can be useful to unmask a layer of complexity involved with deciphering a corresponding infix expression.

The present invention can display information about basic mathematical operations of formulas such as in FIG. 1 and also describe more complex relationships such as those created through the use of functions and the combination of functions and mathematical operations within formulas. For the purpose of the present invention a cell containing only a function is also considered a formula. Functions are commonly used by spreadsheets to provide decisional behavior or conduct complex sets of computations to derive a specific output based on a given input. In the tree of FIG. 3 the simple function IF(1,2,3) is represented. The native spreadsheet syntax of IF(1,2,3) is interpreted as follows: The first input operator is the condition, the second is the output if the condition is true, and the third is the output if the condition is false. In FIG. 3, the tree successfully portrays the decisional branching conditions of the function within the formula and the role of the dependent operators (input). The representation is not only limited to decisional functions but can also relay information about unary or multiple input functions and the relationship to their associated operators (input). FIG. 5 shows a tree which represents a formula containing a computational function MEDIAN(1,2,3) which finds the median of the given input values, in this case three values (1,2, and 3). It is also just as feasible for the present invention to represent unary operations such as increments or decrements also used in spreadsheets. A graphical depiction of a unary operator in a formula would include an operator node connected to a single operand node. In effect, the number of child nodes that a particular operator node possesses is based upon the function of the operator, while the number of nodes within the entire tree is dependent on the function of the formula as a whole. The tree depicted in FIG. 6 represents a compound type tree with a variety of input nodes. A specific note of interest concerning FIG. 6 is the reference node labeled as Field 12. This reference node corresponds to an area reference which represents a group of continuous adjacent cells. As such, the child nodes consist of any contents within the area. In this case, they consist of a reference, a value, and a function. This type of view can easily become crowded with information if the area spans a large number of cells in the spreadsheet. For this reason, the user is able to select whether to show or hide all or certain child nodes of any type of reference node as appropriate for the current task. For instance, the user may hide all nodes or choice to view only those nodes which correspond to references, additional formulas, and/or specific data types.

For the purposes of the present invention a cell containing only a reference to another cell is also considered a formula. Such a tree would be represented by a root node displaying the output cell and a reference node connected to the root node. This reference node may also contain a child node displaying the actual contents of the referenced cell. By representing the contents of the referenced cell in a separate child node, physical properties of the child node such as shape, size and/or color may be used to denote the data type, intent or functionality of the cells contents. It should be noted that any node can use such physical properties to relay information about its contents or intent. Many times one or more operands found in a formula are references to other cells. Sometimes these reference operands point to a cell which contains another formula. In this case the present invention proposes to connect, relationally and graphically, the tree which represents the original cells formula along with any trees representing referenced formulas contained in separate cells. An example of such a compound tree is shown in FIG. 4. The cell references C5 and A1 in the tree of FIG. 4 contain formulas, therefore trees representing these formulas are added as child nodes or sub-trees of the tree based on the formula in the root node cell B8. By providing a view into the broader relationships between formulas the user can easily get a complete picture of how each component will affect the output of the original formula.

Sometimes spreadsheets will contain links to data outside the file. This data may reside in a separate spreadsheet not identified in the source or even a database table or web page. In this case, the present invention provides a display with information that allows the user to distinguish these sources within nodes of a tree and quickly identify their origin. Such information may consist of nodes with a shape specifically denoted to such external links and textual identifiers of the external source.

b. Input Range

The present invention also gathers information pertaining to input ranges. Input ranges consist of a cell, area, range or group thereof which currently or may potentially contain data that can be used as input for formulas and or functions. Input ranges can correspond to cells, areas or ranges which contain static hard coded type values, values which represent the output of a formula and or function, or which are currently empty. The process of discovery for input ranges may include combining ranges into specific groups. The characteristics properties which constitute an input range and the grouping thereof can be based upon user selected options. Such options may include selecting numeric data only, group areas or ranges based on similar formatting properties (e.g. patterns, shading, cell color, borders, fonts), group by continuous columns of cells only, group by continuous rows of cells only, group blocks of all continuous cells, group continuous only if cell values are similar data types, group by areas and not ranges, group by named ranges, or group by current user selected area within the spreadsheet, or any combination thereof. The cells included in a discovery need not contain values they can just as easily be empty cells or cells with no current value data. Including empty cells within an input range is useful when creating a sheet in which further data or input will be entered after the structure has been implemented.

After an input range is established through discovery or user interaction, the user may select to assign automatic expansion to the input range. By assigning the automatic expansion property to an input range any updates made to cells which creates a cell that matches the grouping criteria for the input range will cause the new cells to be added to the existing input range group, consequently updating any corresponding nodes in the display accordingly. Automatic expansion can be set by the user to consider cells that are continuous or non-continuous in relation to the existing input range.

Once a group of input ranges is established each group is depicted as a single node corresponding to the output cell(s) of the input ranges. Just as with other nodes, the input range nodes may use physical properties such as shape, size and/or color to denote data types, intent, or functionality of the underlying content. Since the ranges may consist of single cells, areas, ranges or any number and combination thereof, the output location can be a single location or a range depicted by the highest right most cell followed by the lowest left most cell within each range contained within the group as is a common method of denoting a continuous block of adjacent cells. If the input range consists or two or more separate continuous blocks of adjacent cells which together are not continuous each of these areas can be listed in the format of the previous sentence and displayed in association with the node.

c. Names

Both dynamic content and input range groups can be assigned names. The present invention assigns names by displaying a text string within the nodes that represent such named components. By labeling these nodes with a meaningful names the intent and purpose of these components is communicated clearer making it easier to build and understand formulas and the underlying spreadsheet structure. Determining the names for these nodes can be accomplished via user input in which the user directly assigns a name or by means of name discovery. Name discovery involves the process of determining what data within the source sheet cells properly identifies dynamic content and input range groups. The actual algorithmic process of name discovery will be disclosed below.

2) Build/Modify/Design

Within the Build/Modify/Design component the user is able to interact with the nodes within the display to arrange or change existing trees or build new trees. The specific means in which the user can build new models is described in further detail within the operation section below.

3) Synchronize

The synchronize component allows the user to keep spreadsheet cells and trees consistent with each other during changes made to either the cell contents or the trees. This is accomplished through a real time push and pull of changes as they occur or user interaction depending on the settings selected by the user. The user can select to synchronize all, one or a selected group of cells or trees. The synchronization method can consist of creating the infix equivalent of an updated tree or vice versa and then updating the contents of the cell or tree accordingly. Such methods of converting a tree to infix notation and vice versa are well known in the art. Updating of the cell and tree is communicated by the link to the source as described at the beginning of this description section.

II. Algorithmic Processes:

1) Formula Compare/Compression

Formulas within a spreadsheet(s) often share a common underlying structure. Such can be found when a formula is copied and pasted down a column or row in the form of what is referred to as a formula array. It is also possible that two or more formulas may share a common structure but reside in completely separate locations and not be considered part of a formula array. In either case the present invention allows these formulas which share a common structure to be combined or compressed into one generic tree which represents the calculations of all formulas in the set. This allows the user to view this structure only once and hide information which may be considered repetitive for certain purposes. Another added benefit is the ability to make changes to the compressed generic formula tree. Once the user makes changes to this generic tree they can choose to push or cascade these changes to all member formulas. This is particularly useful since the present method of re-using a formula within an area consists of the following steps: change a formula, manually locate all similar formulas and then copy and paste the updated formula to all desired old formula cells. The last two steps of the present method are especially error prone and lead to inconsistent formula updating. An example of a collection of trees sharing a common underlying structure is shown in FIG. 7 along with their corresponding generic tree shown in FIG. 8.

The present invention allows the user to select the parameters to consider or omit when performing the formula compression which may include similar data types, similar relative or absolute address locations, contents of range or area node, or input of functions. By comparing or omitting these differences and similarities the user can determine the threshold for combining two formula trees into one generic tree. Methods of comparing and compressing trees can consist of traversing two or more trees and comparing the nodes of each tree at similar points. Such methods and algorithms of tree traversal and data comparison are well known in the art.

The user is also free to expand a compressed generic tree to view the representation of any or all member trees. By expanding a compressed generic tree and exposing the member trees, the user is also free to make changes to individual member trees. Such changes may subsequently lead to the exclusion of these individual trees from the generic tree set.

2) Name Discovery

Besides user input of names for nodes names can be assigned through name discovery. The two major methods are applying named ranges to nodes with identical address contents and format recognition. Applying named ranges involves simply comparing the address of a named range to the address of a node's corresponding cells. If both match then the named range label is added as a label to the node. Format recognition is somewhat more complex. This involves determining what data within the source most likely was added to describe the contents of the node to name. Such methods may include looking for the first text string to the right and or above a single cell range and concatenating these strings to name that range. In the case of naming a node referring to a single vertical column range the first text string above the range might make the best choice. Choosing the first text string to the right could apply when naming a node representing a single horizontal row range. Determine which rules to apply can be a combination of pre-defined settings along with user selected options and can be accomplished within the present invention using well known programming routines.

3) Partial Resolution

The partial resolution component of the present invention consists of a process for evaluating a portion of the calculations within a formula based on a selected node. This allows the user to view the results of all the calculations performed by all children nodes of a particular node within a formula. Such functionality provides a method for “stepping through” the operation of a formula in an effort to build understanding or debug errors of the formula. The implementation of partial resolution within the present invention can consist of traversing all child nodes of the selected node and performing the corresponding calculations. Such methods of traversal and calculation are well known algorithms and routines of the art. Actual user interaction with the partial resolution component is described further in the operation section below.

III. Views

1) Input Range View

The input range view allows the user to view all input range nodes generated during the discovery process. One important aspect of the input range view is the ability to represent dynamic content in terms of their current output values. For instance, the present invention provides a view of dynamic content in the form of a root node containing the currently present output value of the formula in the underlying cell along with the standard labeling and location text. This is comparable to the normal operational view of a spreadsheet, in which the underlying formulas of the dynamic content are hidden from the users view and only the resolved values are displayed. Such a view relays an output value and the location of that value. This makes it intuitive for the user to apply the resolved values as input to other formulas and functions without being concerned about the underlying mechanics. The same principle applies to the present invention. A “static like” view of the dynamic content is provided so the user can quickly acknowledge these resolved values when building new formulas and planning overall spreadsheet design.

2) Dynamic Content View

The dynamic content view allows the user to view a display of all trees representing dynamic content. Within the dynamic content view the user can select options which affect the view of the dynamic content trees. For instance, the user can select to show separate trees for all formulas regardless if the formula is already being represented through a reference node of a separate tree. By selecting to separate out referenced formulas the user can see all formulas as individual trees. On the other hand, when combining the view of referenced formulas with the referring formula the user can view a complete picture of the process which leads to the final output of the referring formula. The user may also hide or unhide the nodes contain in large collections such as within an area reference node. An area reference can refer to a large group of cells. By showing or hiding children nodes of an area reference node the user can establish a view which is meaningful to their current task without overloading the display with unnecessary information.

3) Combined Dynamic Content and Input Range View

The user may also choose to view the dynamic content together with input ranges. This allows the user to build formulas by combining dynamic content with input ranges, the process of which is described in the operation section below.

4) Additional Information Views

It should be noted that in addition to the core views described above information can be presented to the user in the form of textual lists, icon lists, tree lists, dialog boxes, and pop-up style overlays outside the contents of nodes. The information displayed with these methods may coincide with the information present in nodes and the views above and can include additional information as well. The information in these additional views may include node names, cell locations, relative and absolute address information, formula or function errors, calculation values, etc. By providing an alternate method of navigating information the present invention allows the user to reduce information in the nodes and core display views to minimum in an effort to simplify the view for the building and modeling processes. While the user may keep the model information minimized, the additional information views provide the ability to “drill down” to reveal further relevant details or view information which would not physically fit within the dimensions of the nodes or core display views.

OPERATION

The user interacts with the present invention through the use of a graphical user interface. Primary functionality such as Analysis/Discovery, Synchronization and changing of Views consists of selecting operations and setting values via a toolbar or menu options. A user operation such as building formulas consists of more specific interaction. The user can build a formula from nodes using a “drag and drop” style to move one root node on top of another root node. Any of these nodes can correspond to input ranges or dynamic content and both types are completely interoperable. Once two root nodes are dropped together the user may select an operator or function from a toolbox to specify the relationship. FIG. 11 shows a drag and drop type build of a simple formula from input range nodes while FIG. 12 shows a similar build using dynamic content nodes. Again the user can just as easily build a formula by combining input range nodes with dynamic content nodes since they are interoperable. If the selected operator or function expects additional input nodes the user can be prompted visually with new branches connected to a graphical node depicting an empty position. These empty position nodes are used as a visual aid to prompt the user for the intended data type. For instance the shape of the empty node could be consistent with the expected data type, but a separate standard color for empty position nodes may be applied. In this way the user can quickly determine what type of root nodes can be inserted and identify which nodes are still empty. For purposes of determining input position of nodes different methods can be applied (e.g. the last node selected by the cursor could be set as the leftmost input, or if a node is dropped to the right of the root node of the tree it goes to the far right or far left, or if the new node is dropped between two nodes the newly dropped node is inserted in between. When dragging and dropping a left click and hold may be used to move a single node in a tree and a right click and hold may allow the movement of the all connected nodes of a tree in unison. The user could also initiate building a new formula by selecting a root node and an operator from the toolbox. In this case the root node may become the right or leftmost operator with empty node graphics displaying expected input so that the proper number of additional nodes may be dropped in as before. In some cases operators can accept an arbitrary number of inputs. In this case the user is free to add as many nodes as desired.

Many properties of formulas are also accessible by clicking on a specific node. Once a specific node is selected a user can quickly change cell contents, cell references, or operator of that node. Another method of managing the properties and contents of nodes consists of using a dialog box form to set values and properties of the nodes. A dialog box form is useful to allow the user to see many properties at once and work with textual names in a font size larger then that which will fit within a node. Another useful feature for navigating and comprehending a spreadsheet as a whole consists of the model map FIG. 10. This is an area of the display which is divided into cells of a grid representing the all sheets participating within the current target link. Each cell in the model map grid corresponds to a cell within a sheet of the target link. The cells of the grid are made small in size so the user can view many cells at once. If there are many sheets within the current target link the user can scroll horizontally to view additional sheets. The user can also scroll horizontally and vertically within in sheet view to navigate the entire contents of a sheet. Each cell in the formula map grid is highlighted in a color which corresponds to a color tagged on the tree representing the dynamic content (formula, generic formula, or reference) or input contained in that cell within the spreadsheet. If no dynamic content or input ranges are present in a cell it is not shaded. This allows the user to quickly find out where specific the dynamic content, trees, formulas, generic formulas, and input ranges reside. The current contents displayed in the model map can be toggled by the user to show dynamic content only, input ranges only or both combined.

Another useful feature contained in the operation of the present invention is partial resolution or the ability to view the resolution of a formula at any node within the tree. The user would invoke such behavior by placing the cursor over a node. In this mode the present invention will display a small overlay containing the value of the formula based upon all child operand and operator nodes below. This is particularly useful for tracing through specific points of formulas to find errors or analyze behavior. FIG. 9 shows a screen shot of a partial resolution scenario.

It should be noted that the descriptions provided herein represent one embodiment of the present invention. As such, other embodiments, changes, modifications to the proposed embodiment may be provided which remain within the scope and spirit of the present invention. 

1) A method for analysis of computation. a) Wherein said computation consists of spreadsheet formulas. b) Wherein said analysis consists of a graphical depiction. c) Wherein said formulas contain operators and operands. d) Wherein said operators contain relationships to said operands. e) Wherein said relationships are represented as tree or graph structures. f) Wherein said tree or graph structures contain nodes. g) Wherein said nodes are depicted by graphical images within a view or display. h) Wherein said nodes represent said operators and said operands. 2) The process of creating said analysis from said formulas. 